A Method for Extracting Information from the Web Using Deep Learning Algorithm
Authors
Abstract
Research on web mining is becoming increasingly important because large amounts of data are now managed through the internet. Web usage is growing in an uncontrolled manner, and a dedicated system is needed to handle such large volumes of data on the web. Web mining is divided into three major areas: web content mining, web usage mining, and web structure mining. In this paper, we propose a web content mining approach based on a deep learning algorithm. The deep learning algorithm offers an advantage over Bayesian networks because a Bayesian network does not follow a learning architecture of the kind used in the proposed technique. The proposed approach considers three features for extracting web content: a concept feature, which deals with semantic relations in the web; a format feature, which deals with the format of the content; and a title feature, which deals with the web title. These features produce model parameters, which are given as input to the deep learning algorithm. The experimental analysis shows that the proposed approach is efficient at web content extraction. The average precision, recall, and F-measure values are 83.875%, 78.3%, and 80.83%, respectively.
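As a reading aid for the reported metrics, the following is a minimal sketch of how precision, recall, and F-measure are conventionally computed for a content extraction task. It is not the authors' evaluation code: the function name, the representation of web content as sets of block identifiers, and the sample data are all assumptions made for illustration. Note also that an average of per-page F-measures generally differs from the F-measure computed from average precision and average recall, which may explain why the reported 80.83% is not exactly the harmonic mean of 83.875% and 78.3%.

```python
def precision_recall_f(extracted, relevant):
    """Return (precision, recall, F-measure) for one page.

    `extracted` holds the content blocks a system returned and `relevant`
    holds the blocks that should have been returned. Both are hypothetical
    sets of block identifiers, not data from the paper.
    """
    extracted, relevant = set(extracted), set(relevant)
    true_positives = len(extracted & relevant)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Illustrative data: 4 blocks extracted, 5 blocks relevant, 3 in common.
p, r, f = precision_recall_f({"b1", "b2", "b3", "b4"},
                             {"b2", "b3", "b4", "b5", "b6"})
print(f"precision={p:.3f} recall={r:.3f} f_measure={f:.3f}")
```

Averaging these three values over a set of test pages would give figures of the kind reported in the abstract, under the assumption that the paper averages per-page scores.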
Similar resources
Efficient Method Based on Combination of Deep Learning Models for Sentiment Analysis of Text
People's opinions about a specific concept are considered as one of the most important textual data that are available on the web. However, finding and monitoring web pages containing these comments and extracting valuable information from them is very difficult. In this regard, developing automatic sentiment analysis systems that can extract opinions and express their intellectual process has ...
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
Anomaly-based Web Attack Detection: The Application of Deep Neural Network Seq2Seq With Attention Mechanism
Today, the use of the Internet and Internet sites has become an integral part of people’s lives, and most activities and important data reside on Internet websites. Thus, attempts to intrude into these websites have grown exponentially. Intrusion detection systems (IDS) for web attacks are one approach to protecting users. However, these systems suffer from drawbacks such as low accuracy in ...
Improving the nonlinear manifold separator model for face recognition with a single image per person
Manifold learning is a dimension reduction method for extracting nonlinear structures of high-dimensional data. Many methods have been introduced for this purpose. Most of these methods usually extract a global manifold for data. However, in many real-world problems, there is not only one global manifold, but also additional information about the objects is shared by a large number of manifolds...
Optimizing Membership Functions using Learning Automata for Fuzzy Association Rule Mining
Transactions in web data often consist of quantitative data, suggesting that fuzzy set theory can be used to represent such data. The time spent by users on each web page, one type of web data, was regarded as a trapezoidal membership function (TMF) and can be used to evaluate user browsing behavior. The quality of mining fuzzy association rules depends on membership functions and since t...